Add configurable MoE runtime flags to MegatronConfig#1213

Draft
tyler-griggs wants to merge 2 commits into main from tgriggs/moe-config-fields

Conversation


@tyler-griggs tyler-griggs commented Feb 25, 2026

Summary

Exposes 5 MoE runtime config flags as first-class fields on the MegatronConfig dataclass, replacing 2 hardcoded values in the Megatron worker.

New fields

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| `moe_token_dispatcher_type` | `str` | `"alltoall"` | Expert dispatch strategy (was hardcoded) |
| `moe_router_load_balancing_type` | `str` | `"none"` | Load balancing loss type (was hardcoded) |
| `moe_grouped_gemm` | `bool` | `False` | Fused grouped GEMM for MoE |
| `moe_router_score_function` | `Optional[str]` | `None` | Router scoring (e.g. `"sigmoid"` for GLM/DeepSeek-V3) |
| `moe_router_enable_expert_bias` | `Optional[bool]` | `None` | Learned expert bias for load balancing |
  • Most architecture flags (num_experts, topk, qkv_bias, rotary, moe_layer_freq) are auto-detected by AutoBridge from the HF config.json — no explicit config needed
  • These 5 fields are specifically for runtime behavior not in the HF config
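The table above maps directly onto dataclass fields. A minimal sketch of how the new fields might look on the dataclass (only the five new fields are shown; the real `MegatronConfig` has many more, and the surrounding layout is assumed):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MegatronConfig:
    # Sketch: just the five new MoE runtime fields described in the table.
    moe_token_dispatcher_type: str = "alltoall"
    moe_router_load_balancing_type: str = "none"
    moe_grouped_gemm: bool = False
    moe_router_score_function: Optional[str] = None  # e.g. "sigmoid"
    moe_router_enable_expert_bias: Optional[bool] = None


# Defaults match the previously hardcoded behavior, so existing configs
# are unaffected unless a field is set explicitly.
cfg = MegatronConfig(moe_router_score_function="sigmoid")
```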


Expose 5 MoE runtime config flags as first-class MegatronConfig fields:
- moe_token_dispatcher_type (replaces hardcoded "alltoall")
- moe_router_load_balancing_type (replaces hardcoded "none")
- moe_grouped_gemm (enables fused grouped GEMM for MoE)
- moe_router_score_function (e.g. "sigmoid" for GLM/DeepSeek-V3)
- moe_router_enable_expert_bias (learned bias for load balancing)

Most architecture flags (num_experts, topk, qkv_bias, rotary, etc.)
are auto-detected by AutoBridge from the HF config.json. These 5 flags
control runtime behavior that is NOT in the HF config.

Advanced/model-specific flags can still be passed through the existing
transformer_config_kwargs dict, which is applied after these fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
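The override order described in the commit message (first-class fields applied first, then `transformer_config_kwargs` on top) can be sketched as follows. The `provider` object and the exact field set here are assumptions for illustration, not the worker's actual code:

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Dict


@dataclass
class MegatronConfig:
    # Sketch: one first-class field plus the pass-through kwargs dict.
    moe_grouped_gemm: bool = False
    transformer_config_kwargs: Dict[str, Any] = field(default_factory=dict)


megatron_config = MegatronConfig(
    moe_grouped_gemm=True,
    transformer_config_kwargs={"moe_grouped_gemm": False},
)

provider = SimpleNamespace()
# 1) First-class fields are applied first...
provider.moe_grouped_gemm = megatron_config.moe_grouped_gemm
# 2) ...then transformer_config_kwargs, which can therefore override them.
for key, value in megatron_config.transformer_config_kwargs.items():
    setattr(provider, key, value)

print(provider.moe_grouped_gemm)  # False -- the kwargs dict wins
```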
@devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


@gemini-code-assist bot left a comment

Code Review

This pull request correctly exposes several Mixture-of-Experts (MoE) runtime flags in MegatronConfig, replacing previously hardcoded values and improving configurability. The changes are well implemented across the configuration files and worker logic, and are accompanied by new tests.

My review includes a couple of suggestions for improvement:

  • In megatron_worker.py, I've suggested a small refactoring to make the application of optional MoE configurations more concise and maintainable.
  • In the new test file test_moe_config.py, I've pointed out a test case that could be made more exhaustive by adding assertions for all the fields being tested.

Overall, this is a solid contribution that enhances the flexibility of MoE configurations.

Comment on lines +244 to +247

```python
if megatron_config.moe_router_score_function is not None:
    provider.moe_router_score_function = megatron_config.moe_router_score_function
if megatron_config.moe_router_enable_expert_bias is not None:
    provider.moe_router_enable_expert_bias = megatron_config.moe_router_enable_expert_bias
```

medium

To improve maintainability and reduce repetition, you can loop over the optional MoE configuration fields. This makes it easier to add more optional fields in the future.

Suggested change:

```diff
-if megatron_config.moe_router_score_function is not None:
-    provider.moe_router_score_function = megatron_config.moe_router_score_function
-if megatron_config.moe_router_enable_expert_bias is not None:
-    provider.moe_router_enable_expert_bias = megatron_config.moe_router_enable_expert_bias
+for field_name in ("moe_router_score_function", "moe_router_enable_expert_bias"):
+    value = getattr(megatron_config, field_name)
+    if value is not None:
+        setattr(provider, field_name, value)
```
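The reviewer's loop pattern is easy to verify in isolation. A self-contained sketch (the `provider` stand-in and its starting values are assumptions): only fields the user explicitly set are copied over, while `None` fields leave the provider's defaults untouched.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class MegatronConfig:
    moe_router_score_function: Optional[str] = None
    moe_router_enable_expert_bias: Optional[bool] = None


megatron_config = MegatronConfig(moe_router_score_function="sigmoid")
# Stand-in for the real provider, with its own defaults.
provider = SimpleNamespace(
    moe_router_score_function="softmax",
    moe_router_enable_expert_bias=False,
)

# Only override provider attributes that were explicitly configured.
for field_name in ("moe_router_score_function", "moe_router_enable_expert_bias"):
    value = getattr(megatron_config, field_name)
    if value is not None:
        setattr(provider, field_name, value)

print(provider.moe_router_score_function)      # sigmoid (overridden)
print(provider.moe_router_enable_expert_bias)  # False (left at provider default)
```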

Comment on lines +49 to +51

```python
assert cfg.moe_grouped_gemm is True
assert cfg.moe_router_score_function == "sigmoid"
assert cfg.moe_router_enable_expert_bias is True
```

medium

This test is not exhaustive. It's good practice to assert all fields that are being set from the dictionary, to ensure the build_nested_dataclass function works as expected for all new fields. The assertions for moe_token_dispatcher_type and moe_router_load_balancing_type are missing.

Suggested change:

```diff
+assert cfg.moe_token_dispatcher_type == "alltoall"
+assert cfg.moe_router_load_balancing_type == "none"
 assert cfg.moe_grouped_gemm is True
 assert cfg.moe_router_score_function == "sigmoid"
 assert cfg.moe_router_enable_expert_bias is True
```
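The shape of the full test after the suggestion can be sketched self-containedly. Here plain dataclass construction stands in for the project's `build_nested_dataclass` helper, whose signature is not shown in this PR:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MegatronConfig:
    moe_token_dispatcher_type: str = "alltoall"
    moe_router_load_balancing_type: str = "none"
    moe_grouped_gemm: bool = False
    moe_router_score_function: Optional[str] = None
    moe_router_enable_expert_bias: Optional[bool] = None


raw = {
    "moe_token_dispatcher_type": "alltoall",
    "moe_router_load_balancing_type": "none",
    "moe_grouped_gemm": True,
    "moe_router_score_function": "sigmoid",
    "moe_router_enable_expert_bias": True,
}
# Stand-in for build_nested_dataclass(MegatronConfig, raw).
cfg = MegatronConfig(**raw)

# Assert every field set from the dictionary, per the review suggestion.
assert cfg.moe_token_dispatcher_type == "alltoall"
assert cfg.moe_router_load_balancing_type == "none"
assert cfg.moe_grouped_gemm is True
assert cfg.moe_router_score_function == "sigmoid"
assert cfg.moe_router_enable_expert_bias is True
```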

@tyler-griggs tyler-griggs marked this pull request as draft February 26, 2026 00:46
…oad_balancing_type in test_moe_config_from_dict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


1 participant